{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "zcWElZZA7rzX"
      },
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/search/question-answering/table-qa.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/search/question-answering/table-qa.ipynb)\n",
        "\n",
        "# Table Question Answering with Pinecone"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JZqviV15sO31"
      },
      "source": [
        "Table Question Answering (Table QA) refers to providing precise answers from tables to answer a user's question. With recent works on Table QA, is it now possible to answer natural language queries from tabular data. This notebook demonstrates how you can build a Table QA system that can answer your natural language queries using the Pinecone vector database. \n",
        "\n",
        "We need three main components to build the Table QA system:\n",
        "\n",
        "- A vector index to store table embeddings\n",
        "- A retriever model for embedding queries and tables\n",
        "- A reader model to read the tables and extract answers"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-h-MW6FJxTtT"
      },
      "source": [
        "# Install Dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OXXxZ_9q75RH"
      },
      "outputs": [],
      "source": [
        "# torch-scatter may take few minutes to install\n",
        "!pip install datasets pinecone-client sentence_transformers torch-scatter"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fyqzTw5RBeEa"
      },
      "source": [
        "# Load the Dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-ouitEo-0XQC"
      },
      "source": [
        "We will work with a subset of the Open Table-and-Text Question Answering ([OTT-QA](https://github.com/wenhuchen/OTT-QA)) dataset, consisting of texts and tables from Wikipedia. The subset contains 20,000 tables, and it can be loaded from the Huggigface Datasets hub as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "kKf3UJJFlM2R",
        "outputId": "ea39017c-86ea-454d-b155-8c203acecf83"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "WARNING:datasets.builder:Using custom data configuration ashraq--ott-qa-20k-9a5d7971907a3e67\n",
            "WARNING:datasets.builder:Found cached dataset parquet (/root/.cache/huggingface/datasets/ashraq___parquet/ashraq--ott-qa-20k-9a5d7971907a3e67/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "Dataset({\n",
              "    features: ['url', 'title', 'header', 'data', 'section_title', 'section_text', 'uid', 'intro'],\n",
              "    num_rows: 20000\n",
              "})"
            ]
          },
          "execution_count": 1,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "# load the dataset from huggingface datasets hub\n",
        "data = load_dataset(\"ashraq/ott-qa-20k\", split=\"train\")\n",
        "data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Dv_CU6hs1Yd6",
        "outputId": "5b0d52a3-f138-4328-dafc-a54d186e4ef7"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'url': 'https://en.wikipedia.org/wiki/1976_New_York_Mets_season',\n",
              " 'title': '1976 New York Mets season',\n",
              " 'header': ['Level', 'Team', 'League', 'Manager'],\n",
              " 'data': [['AAA', 'Tidewater Tides', 'International League', 'Tom Burgess'],\n",
              "  ['AA', 'Jackson Mets', 'Texas League', 'John Antonelli'],\n",
              "  ['A', 'Lynchburg Mets', 'Carolina League', 'Jack Aker'],\n",
              "  ['A', 'Wausau Mets', 'Midwest League', 'Bill Monbouquette'],\n",
              "  ['Rookie', 'Marion Mets', 'Appalachian League', 'Al Jackson']],\n",
              " 'section_title': 'Farm system',\n",
              " 'section_text': 'See also : Minor League Baseball',\n",
              " 'uid': '1976_New_York_Mets_season_7',\n",
              " 'intro': 'The New York Mets season was the 15th regular season for the Mets, who played home games at Shea Stadium. Led by manager Joe Frazier, the team had an 86-76 record and finished in third place in the National League East.'}"
            ]
          },
          "execution_count": 2,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data[2]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Wz0Z6qF_EUzM"
      },
      "source": [
        "As we can see, the dataset includes both textual and tabular data that are related to one another. Let's extract and transform the dataset's tables into pandas dataframes as we will only be using the tables in this example."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "BmoFdpptEXHO"
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "# store all tables in the tables list\n",
        "tables = []\n",
        "# loop through the dataset and convert tabular data to pandas dataframes\n",
        "for doc in data:\n",
        "    table = pd.DataFrame(doc[\"data\"], columns=doc[\"header\"])\n",
        "    tables.append(table)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "c56PfwR4F1cZ",
        "outputId": "5cebb3d9-754d-4dc2-b1b5-3536552b3907"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-5746ac5e-ff5f-42a9-9fb9-48b8dd8bebd9\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Level</th>\n",
              "      <th>Team</th>\n",
              "      <th>League</th>\n",
              "      <th>Manager</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>AAA</td>\n",
              "      <td>Tidewater Tides</td>\n",
              "      <td>International League</td>\n",
              "      <td>Tom Burgess</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>AA</td>\n",
              "      <td>Jackson Mets</td>\n",
              "      <td>Texas League</td>\n",
              "      <td>John Antonelli</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>A</td>\n",
              "      <td>Lynchburg Mets</td>\n",
              "      <td>Carolina League</td>\n",
              "      <td>Jack Aker</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>A</td>\n",
              "      <td>Wausau Mets</td>\n",
              "      <td>Midwest League</td>\n",
              "      <td>Bill Monbouquette</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Rookie</td>\n",
              "      <td>Marion Mets</td>\n",
              "      <td>Appalachian League</td>\n",
              "      <td>Al Jackson</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-5746ac5e-ff5f-42a9-9fb9-48b8dd8bebd9')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-5746ac5e-ff5f-42a9-9fb9-48b8dd8bebd9 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-5746ac5e-ff5f-42a9-9fb9-48b8dd8bebd9');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "    Level             Team                League            Manager\n",
              "0     AAA  Tidewater Tides  International League        Tom Burgess\n",
              "1      AA     Jackson Mets          Texas League     John Antonelli\n",
              "2       A   Lynchburg Mets       Carolina League          Jack Aker\n",
              "3       A      Wausau Mets        Midwest League  Bill Monbouquette\n",
              "4  Rookie      Marion Mets    Appalachian League         Al Jackson"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "tables[2]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0n61NMn9-HZv"
      },
      "source": [
        "# Initialize Retriever"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DNfBXFk4Arh4"
      },
      "source": [
        "The retriever transforms natural language queries and tabular data into embeddings/vectors. It will generate embeddings in a way that the natural language questions and tables containing answers to our questions are nearby in the vector space.\n",
        "\n",
        "We will use a SentenceTransformer model trained specifically for embedding tabular data for retrieval tasks. The model can be loaded from the Huggingface Models hub as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 208,
          "referenced_widgets": [
            "872351e5e09f4a2b9f1c23b1316b6a36",
            "f9194e661d484dc6ba249100ad7792da",
            "b25799ff3cb94fada82e7a64b708a377",
            "409092d8468c4f82bc2b6cf2f0b48dd9",
            "0398823b05ac46b3bb0d5918698e92aa",
            "a36059ff9de34279b633d26231da05de",
            "a186c7e967744c4d8d48778c36cdc08d",
            "0806e30586ef45e1b5c6de273d4b820e",
            "972f7c08489b46b6bf74bd4bcf0ecaa9",
            "a07f6ff6e9464b54aeda12df8232f6ac",
            "e8f14835a2324e56bb5d4c862767c602"
          ]
        },
        "id": "cfonlEiX-KQA",
        "outputId": "ffa1c02b-79d7-4360-938c-a58955384d29"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Moving 0 files to the new cache system\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "872351e5e09f4a2b9f1c23b1316b6a36",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "0it [00:00, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "SentenceTransformer(\n",
              "  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel \n",
              "  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})\n",
              "  (2): Normalize()\n",
              ")"
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import torch\n",
        "from sentence_transformers import SentenceTransformer\n",
        "\n",
        "# set device to GPU if available\n",
        "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
        "# load the table embedding model from huggingface models hub\n",
        "retriever = SentenceTransformer(\"deepset/all-mpnet-base-v2-table\", device=device)\n",
        "retriever"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gypflS135_Y4"
      },
      "source": [
        "The retriever expects tables to be in a particular format. Let's write a function to convert the tables to this format."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "9-MqljKaAarE"
      },
      "outputs": [],
      "source": [
        "def _preprocess_tables(tables: list):\n",
        "    processed = []\n",
        "    # loop through all tables\n",
        "    for table in tables:\n",
        "        # convert the table to csv and \n",
        "        processed_table = \"\\n\".join([table.to_csv(index=False)])\n",
        "        # add the processed table to processed list\n",
        "        processed.append(processed_table)\n",
        "    return processed\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "t4T_eMtEHlw9"
      },
      "source": [
        "Notice that we are only using tables here. However, if you want the retriever to take the metadata into account while retrieving the tables, you can join any metadata strings, such as title, section_title, etc., separated by new line characters at the beginning of the processed table."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KIdluqrZIH0X"
      },
      "source": [
        "Let's take a look at the formatted tables."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        },
        "id": "4NTu2-qqAbk2",
        "outputId": "e4bee59c-f90c-4e38-ce24-06a769f8b402"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'Level,Team,League,Manager\\nAAA,Tidewater Tides,International League,Tom Burgess\\nAA,Jackson Mets,Texas League,John Antonelli\\nA,Lynchburg Mets,Carolina League,Jack Aker\\nA,Wausau Mets,Midwest League,Bill Monbouquette\\nRookie,Marion Mets,Appalachian League,Al Jackson\\n'"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# format all the dataframes in the tables list\n",
        "processed_tables = _preprocess_tables(tables)\n",
        "# display the formatted table\n",
        "processed_tables[2]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FI8TLf5CJFeq"
      },
      "source": [
        "The formatted table may not make sense to us, but the embedding model is trained to understand it and generate accurate embeddings."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FpUcVljhCIaO"
      },
      "source": [
        "# Initialize Pinecone Index"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "B8RzP4snKuCW"
      },
      "source": [
        "We will use the Pinecone vector database as our vector index. The Pinecone index stores vector representations of our tables which we can retrieve using a natural language query (query vector). Pinecone does this by computing the similarity between the query vector and the embedded tables stored in the vector index. \n",
        "\n",
        "To use Pinecone, we first need to initialize a connection to Pinecone. For this, we need a [free API key](https://app.pinecone.io/), and then we initialize the connection like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 61,
      "metadata": {
        "id": "_huF14A0ASsL"
      },
      "outputs": [],
      "source": [
        "from pinecone import Pinecone\n",
        "\n",
        "# connect to pinecone environment\n",
        "pinecone.init(\n",
        "    api_key=\"YOUR API KEY\",\n",
        "    environment=\"YOUR_ENV\"  # find next to API key in console\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TWqgjlaROvHX"
      },
      "source": [
        "Now we create a new index. We specify the metric type as \"cosine\" and dimension as 768 because the retriever we use to generate context embeddings outputs 768-dimension vectors. Pinecone will use cosine similarity to compute the similarity between the query and table embeddings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 62,
      "metadata": {
        "id": "NBOKt54UCbJC"
      },
      "outputs": [],
      "source": [
        "# you can choose any name for the index\n",
        "index_name = \"table-qa\"\n",
        "\n",
        "# check if the table-qa index exists\n",
        "if index_name not in pinecone.list_indexes().names():\n",
        "    # create the index if it does not exist\n",
        "    pinecone.create_index(\n",
        "        index_name,\n",
        "        dimension=768,\n",
        "        metric=\"cosine\"\n",
        "    )\n",
        "\n",
        "# connect to table-qa index we created\n",
        "index = pinecone.Index(index_name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RnIu8g2rBlWB"
      },
      "source": [
        "# Generate Embeddings and Upsert"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ofe-rqwBQIii"
      },
      "source": [
        "\n",
        "Next we need to generate the table embeddings and upload it to the Pinecone index. We can easily do that as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 118,
          "referenced_widgets": [
            "9283fd1259da4411b22315e66f0fde4b",
            "82419e00d85542d4a40ba725ac938820",
            "e141a30db58d4fedb680cfdbb6156fe9",
            "527d87292efc4589b652841afbd4845c",
            "8e6f046688664ea2a8b4ad91067cb0a4",
            "4556db45bd464c189e5b771cb0773480",
            "f91627aef80646c38455fe7b31ed9d40",
            "56dafa49175849e29a68782c43735b9e",
            "571b97905d8c49498a4f5bc6bcfcf573",
            "0dcc27dac0a34f0597bee867b7a0c1f7",
            "2f88a69a0e1443ff9f91a4054d8e751b"
          ]
        },
        "id": "e-eGf669BxIa",
        "outputId": "3b4da09e-c38d-4a62-f2ef-6dbfc9d15f2f"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "9283fd1259da4411b22315e66f0fde4b",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  0%|          | 0/313 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "{'dimension': 768,\n",
              " 'index_fullness': 0.0,\n",
              " 'namespaces': {'': {'vector_count': 20000}},\n",
              " 'total_vector_count': 20000}"
            ]
          },
          "execution_count": 39,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from tqdm.auto import tqdm\n",
        "\n",
        "# we will use batches of 64\n",
        "batch_size = 64\n",
        "\n",
        "for i in tqdm(range(0, len(processed_tables), batch_size)):\n",
        "    # find end of batch\n",
        "    i_end = min(i+batch_size, len(processed_tables))\n",
        "    # extract batch\n",
        "    batch = processed_tables[i:i_end]\n",
        "    # generate embeddings for batch\n",
        "    emb = retriever.encode(batch).tolist()\n",
        "    # create unique IDs ranging from zero to the total number of tables in the dataset\n",
        "    ids = [f\"{idx}\" for idx in range(i, i_end)]\n",
        "    # add all to upsert list\n",
        "    to_upsert = list(zip(ids, emb))\n",
        "    # upsert/insert these records to pinecone\n",
        "    _ = index.upsert(vectors=to_upsert)\n",
        "\n",
        "# check that we have all vectors in index\n",
        "index.describe_index_stats()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LiHFszRrnRyC"
      },
      "source": [
        "Now the Pinecone index is ready for querying. Let's test to see if it returns tables relevant to our queries."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "7jsPDZq8ox9P",
        "outputId": "8b3e2e75-8837-46c9-aaac-7b6b37dead07"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'matches': [{'id': '19931', 'score': 0.822087, 'values': []}], 'namespace': ''}"
            ]
          },
          "execution_count": 10,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"which country has the highest GDP in 2020?\"\n",
        "# generate embedding for the query\n",
        "xq = retriever.encode([query]).tolist()\n",
        "# query pinecone index to find the table containing answer to the query\n",
        "result = index.query(vector=xq, top_k=1)\n",
        "result\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sx8zgRSYrVB8"
      },
      "source": [
        "The Pinecone index has returned the ```id``` of a table that would contain the answer to our query with 82.2% confidence. Let's see if this table actually contains the answer. We can use the returned ```id``` as an index to get the relevant pandas dataframe from the ```tables``` list."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "K-T9yXZtpb2h",
        "outputId": "6ac4c51e-c1f3-4565-83a4-df7ceb409f4f"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-74ed8f27-1770-4db1-9983-2633ef78b0c5\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Rank</th>\n",
              "      <th>Country</th>\n",
              "      <th>GDP ( PPP , Peak Year ) millions of USD</th>\n",
              "      <th>Peak Year</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>China</td>\n",
              "      <td>27,804,953</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>India</td>\n",
              "      <td>11,321,280</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3</td>\n",
              "      <td>Russia</td>\n",
              "      <td>4,389,960</td>\n",
              "      <td>2019</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>Indonesia</td>\n",
              "      <td>3,778,134</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>Brazil</td>\n",
              "      <td>3,596,841</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-74ed8f27-1770-4db1-9983-2633ef78b0c5')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-74ed8f27-1770-4db1-9983-2633ef78b0c5 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-74ed8f27-1770-4db1-9983-2633ef78b0c5');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "  Rank    Country GDP ( PPP , Peak Year ) millions of USD Peak Year\n",
              "0    1      China                              27,804,953      2020\n",
              "1    2      India                              11,321,280      2020\n",
              "2    3     Russia                               4,389,960      2019\n",
              "3    4  Indonesia                               3,778,134      2020\n",
              "4    5     Brazil                               3,596,841      2020"
            ]
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "id = int(result[\"matches\"][0][\"id\"])\n",
        "tables[id].head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "P3A47xkzuPRx"
      },
      "source": [
        "The table returned by the Pinecone index indeed has the answer to our query. Now we need a model that can read this table and extract the precise answer."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ssatLvB9BUe3"
      },
      "source": [
        "# Initialize Table Reader"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4dwePzfAvS0t"
      },
      "source": [
        "As the reader, we will use a TAPAS model fine-tuned for the Table QA task. TAPAS is a BERT-like Transformer model pretrained in a self-supervised manner on a large corpus of English language data from Wikipedia. We load the model and tokenizer from the Huggingface model hub into a question-answering pipeline."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "234b6430d08d424a8d19153a0f74bc52",
            "f08de790d64a4e5f81705d4cbbf5968d",
            "f453b86b909b40cb89d6636c7b7a5c9d",
            "b4a30af0fbc144b48d295af549ba2d53",
            "82e6aa18d2644f299625377308af5912",
            "ab5b4aa08da64a5ea08cb3ab407a1b88",
            "f7ef382b17294e9a890118bcc55cc879",
            "ea8d8ce27d7746d9bb928fa87dd67610",
            "383da807081440a4956ff80a7093a60c",
            "4338622351e44f06bf6e47d700ffa41b",
            "10dba06df8a0422e9863eb2991922816"
          ]
        },
        "id": "bG2umGVi71CD",
        "outputId": "2bde6f52-f322-4dca-d1de-80a33017c375"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "234b6430d08d424a8d19153a0f74bc52",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Downloading:   0%|          | 0.00/443M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from transformers import pipeline, TapasTokenizer, TapasForQuestionAnswering\n",
        "\n",
        "model_name = \"google/tapas-base-finetuned-wtq\"\n",
        "# load the tokenizer and the model from huggingface model hub\n",
        "tokenizer = TapasTokenizer.from_pretrained(model_name)\n",
        "model = TapasForQuestionAnswering.from_pretrained(model_name, local_files_only=False)\n",
        "# load the model and tokenizer into a question-answering pipeline\n",
        "pipe = pipeline(\"table-question-answering\",  model=model, tokenizer=tokenizer, device=device)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "irYNVILy4iAq"
      },
      "source": [
        "Let's run the table returned by the Pinecone index and the query we used before into the question-answering pipeline to extract the answer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "teqRR63T1gTh",
        "outputId": "f5b25c5d-afce-4be3-99e5-2c16c6b892d9"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'answer': 'China',\n",
              " 'coordinates': [(0, 1)],\n",
              " 'cells': ['China'],\n",
              " 'aggregator': 'NONE'}"
            ]
          },
          "execution_count": 22,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "pipe(table=tables[id], query=query)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FPx5qXw70U0c"
      },
      "source": [
        "The model has precisely answered our query. Let's run some more queries."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GPSC0U6QRuSc"
      },
      "source": [
        "# Querying"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TFpoWLPP49Fo"
      },
      "source": [
        "First, we will define two function to handle our queries and extract answers from tables."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "id": "Dn75qXt7U_hx"
      },
      "outputs": [],
      "source": [
        "def query_pinecone(query):\n",
        "    # generate embedding for the query\n",
        "    xq = retriever.encode([query]).tolist()\n",
        "    # query pinecone index to find the table containing answer to the query\n",
        "    result = index.query(vector=xq, top_k=1)\n",
        "    # return the relevant table from the tables list\n",
        "    return tables[int(result[\"matches\"][0][\"id\"])]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "id": "jfdX3084WHxB"
      },
      "outputs": [],
      "source": [
        "def get_answer_from_table(table, query):\n",
        "    # run the table and query through the question-answering pipeline\n",
        "    answers = pipe(table=table, query=query)\n",
        "    return answers"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 58,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 332
        },
        "id": "-l0-OkC69_N3",
        "outputId": "12aa9a04-61f0-4dcf-83e1-04a89d0d976d"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-dc62f5db-6acf-4ae5-80e4-3e6233b55474\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Manufacturer</th>\n",
              "      <th>Model</th>\n",
              "      <th>Engine</th>\n",
              "      <th>Power Output</th>\n",
              "      <th>Max . Speed ( kph )</th>\n",
              "      <th>Dry Weight ( kg )</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Fiat</td>\n",
              "      <td>805-405</td>\n",
              "      <td>FIAT 1979cc S6 supercharged</td>\n",
              "      <td>130 bhp</td>\n",
              "      <td>220</td>\n",
              "      <td>680</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Alfa Romeo</td>\n",
              "      <td>GPR ( P1 )</td>\n",
              "      <td>Alfa Romeo 1990cc S6</td>\n",
              "      <td>95 bhp</td>\n",
              "      <td>180</td>\n",
              "      <td>850</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Diatto</td>\n",
              "      <td>Tipo 20 S</td>\n",
              "      <td>Diatto 1997cc S4</td>\n",
              "      <td>75 bhp</td>\n",
              "      <td>155</td>\n",
              "      <td>700</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Bugatti</td>\n",
              "      <td>Type 32</td>\n",
              "      <td>Bugatti 1991cc S8</td>\n",
              "      <td>100 bhp</td>\n",
              "      <td>190</td>\n",
              "      <td>660</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Voisin</td>\n",
              "      <td>C6 Laboratoire</td>\n",
              "      <td>Voisin 1978cc S6</td>\n",
              "      <td>90 bhp</td>\n",
              "      <td>175</td>\n",
              "      <td>710</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>Sunbeam</td>\n",
              "      <td></td>\n",
              "      <td>Sunbeam 1988cc S6</td>\n",
              "      <td>108 bhp</td>\n",
              "      <td>180</td>\n",
              "      <td>675</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>Mercedes</td>\n",
              "      <td>M7294</td>\n",
              "      <td>Mercedes 1990cc S4 supercharged</td>\n",
              "      <td>120 bhp</td>\n",
              "      <td>180</td>\n",
              "      <td>750</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>Benz</td>\n",
              "      <td>RH Tropfenwagen</td>\n",
              "      <td>Benz 1998cc S6</td>\n",
              "      <td>95 bhp</td>\n",
              "      <td>185</td>\n",
              "      <td>745</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>Miller</td>\n",
              "      <td>122</td>\n",
              "      <td>Miller 1978cc S8</td>\n",
              "      <td>120 bhp</td>\n",
              "      <td>186</td>\n",
              "      <td>850</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-dc62f5db-6acf-4ae5-80e4-3e6233b55474')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-dc62f5db-6acf-4ae5-80e4-3e6233b55474 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-dc62f5db-6acf-4ae5-80e4-3e6233b55474');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "  Manufacturer            Model                           Engine Power Output  \\\n",
              "0         Fiat          805-405      FIAT 1979cc S6 supercharged      130 bhp   \n",
              "1   Alfa Romeo       GPR ( P1 )             Alfa Romeo 1990cc S6       95 bhp   \n",
              "2       Diatto        Tipo 20 S                 Diatto 1997cc S4       75 bhp   \n",
              "3      Bugatti          Type 32                Bugatti 1991cc S8      100 bhp   \n",
              "4       Voisin   C6 Laboratoire                 Voisin 1978cc S6       90 bhp   \n",
              "5      Sunbeam                                 Sunbeam 1988cc S6      108 bhp   \n",
              "6     Mercedes            M7294  Mercedes 1990cc S4 supercharged      120 bhp   \n",
              "7         Benz  RH Tropfenwagen                   Benz 1998cc S6       95 bhp   \n",
              "8       Miller              122                 Miller 1978cc S8      120 bhp   \n",
              "\n",
              "  Max . Speed ( kph ) Dry Weight ( kg )  \n",
              "0                 220               680  \n",
              "1                 180               850  \n",
              "2                 155               700  \n",
              "3                 190               660  \n",
              "4                 175               710  \n",
              "5                 180               675  \n",
              "6                 180               750  \n",
              "7                 185               745  \n",
              "8                 186               850  "
            ]
          },
          "execution_count": 58,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"which car manufacturers produce cars with a top speed of above 180 kph?\"\n",
        "table = query_pinecone(query)\n",
        "table"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 59,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "M8oG9w6AYlAp",
        "outputId": "411bbb57-0fc1-42d6-8453-6692584c3114"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'answer': 'Fiat, Bugatti, Benz, Miller',\n",
              " 'coordinates': [(0, 0), (3, 0), (7, 0), (8, 0)],\n",
              " 'cells': ['Fiat', 'Bugatti', 'Benz', 'Miller'],\n",
              " 'aggregator': 'NONE'}"
            ]
          },
          "execution_count": 59,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "get_answer_from_table(table, query)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "50g_9v-uEAPi",
        "outputId": "aece309c-3130-4bc6-a0d1-24b33a844c70"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-985a94cf-fc9c-4e3a-969f-e0c6498f57bd\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Year</th>\n",
              "      <th>Name</th>\n",
              "      <th>Location</th>\n",
              "      <th>Rationale</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1839</td>\n",
              "      <td>Robert Hare</td>\n",
              "      <td>Philadelphia , Pennsylvania</td>\n",
              "      <td>Inventor of the oxy-hydrogen blowpipe</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1862</td>\n",
              "      <td>John Ericsson</td>\n",
              "      <td>New York , New York</td>\n",
              "      <td>His work improved the field of heat management...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>1865</td>\n",
              "      <td>Daniel Treadwell</td>\n",
              "      <td>Cambridge , Massachusetts</td>\n",
              "      <td>Heat management . He was awarded especially fo...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>1866</td>\n",
              "      <td>Alvan Clark</td>\n",
              "      <td>Cambridge , Massachusetts</td>\n",
              "      <td>Improved refracting telescopes</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>1869</td>\n",
              "      <td>George Henry Corliss</td>\n",
              "      <td>Providence , Rhode Island</td>\n",
              "      <td>For improving the steam engine</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-985a94cf-fc9c-4e3a-969f-e0c6498f57bd')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-985a94cf-fc9c-4e3a-969f-e0c6498f57bd button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-985a94cf-fc9c-4e3a-969f-e0c6498f57bd');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "   Year                  Name                     Location  \\\n",
              "0  1839           Robert Hare  Philadelphia , Pennsylvania   \n",
              "1  1862         John Ericsson          New York , New York   \n",
              "2  1865      Daniel Treadwell    Cambridge , Massachusetts   \n",
              "3  1866           Alvan Clark    Cambridge , Massachusetts   \n",
              "4  1869  George Henry Corliss    Providence , Rhode Island   \n",
              "\n",
              "                                           Rationale  \n",
              "0              Inventor of the oxy-hydrogen blowpipe  \n",
              "1  His work improved the field of heat management...  \n",
              "2  Heat management . He was awarded especially fo...  \n",
              "3                     Improved refracting telescopes  \n",
              "4                     For improving the steam engine  "
            ]
          },
          "execution_count": 31,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"which scientist is known for improving the steam engine?\"\n",
        "table = query_pinecone(query)\n",
        "table.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 32,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tIBhlsF-ZYgI",
        "outputId": "7c0c41e1-5f59-4822-9e2b-bdb424e503e4"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'answer': 'George Henry Corliss',\n",
              " 'coordinates': [(4, 1)],\n",
              " 'cells': ['George Henry Corliss'],\n",
              " 'aggregator': 'NONE'}"
            ]
          },
          "execution_count": 32,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "get_answer_from_table(table, query)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "uy1dDZSUphKw",
        "outputId": "b3df24b4-0fbd-40b3-98c0-c3c4d1b8c801"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-a83c0554-e109-4f76-907a-43a909cb8156\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Name</th>\n",
              "      <th>Resort Name</th>\n",
              "      <th>Geographic Atoll</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Asdhoo</td>\n",
              "      <td>Asdu Sun Island Resort</td>\n",
              "      <td>North Male Atoll</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>Akirifushi</td>\n",
              "      <td>Oblu Select at Sangeli</td>\n",
              "      <td>North Male Atoll</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Baros</td>\n",
              "      <td>Baros Island Resort</td>\n",
              "      <td>North Male Atoll</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Biyaadhoo</td>\n",
              "      <td>Biyadhoo Island Resort</td>\n",
              "      <td>South Male Atoll</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Bodubandos</td>\n",
              "      <td>Bandos Maldives Resort</td>\n",
              "      <td>North Male Atoll</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-a83c0554-e109-4f76-907a-43a909cb8156')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-a83c0554-e109-4f76-907a-43a909cb8156 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-a83c0554-e109-4f76-907a-43a909cb8156');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "         Name             Resort Name  Geographic Atoll\n",
              "0      Asdhoo  Asdu Sun Island Resort  North Male Atoll\n",
              "1  Akirifushi  Oblu Select at Sangeli  North Male Atoll\n",
              "2       Baros     Baros Island Resort  North Male Atoll\n",
              "3   Biyaadhoo  Biyadhoo Island Resort  South Male Atoll\n",
              "4  Bodubandos  Bandos Maldives Resort  North Male Atoll"
            ]
          },
          "execution_count": 33,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"What is the Maldivian island name for Oblu Select at Sangeli\tresort?\"\n",
        "table = query_pinecone(query)\n",
        "table.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jxmUzLlzuZbP",
        "outputId": "bead739a-4187-4764-bbfb-c14f3f1f4c5a"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'answer': 'Akirifushi',\n",
              " 'coordinates': [(1, 0)],\n",
              " 'cells': ['Akirifushi'],\n",
              " 'aggregator': 'NONE'}"
            ]
          },
          "execution_count": 34,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "get_answer_from_table(table, query)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "p7vUy_zP8DQB"
      },
      "source": [
        "As we can see, our Table QA system can retrieve the correct table from the Pinecone index and extract precise answers from the table. The TAPAS model we use supports more advanced queries. It has an aggregation head which indicates whether we need to count, sum, or average cells to answer the questions. Let's run some advanced queries that require aggregation to answer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 79,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "cIGTeOMJxrGu",
        "outputId": "e9ef6122-c15f-42ab-c325-c1b59e6cced2"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-a20fc2ac-c471-4bc5-8526-26ab08545922\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Rank</th>\n",
              "      <th>Country</th>\n",
              "      <th>GDP ( PPP , Peak Year ) millions of USD</th>\n",
              "      <th>Peak Year</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>China</td>\n",
              "      <td>27,804,953</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>India</td>\n",
              "      <td>11,321,280</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3</td>\n",
              "      <td>Russia</td>\n",
              "      <td>4,389,960</td>\n",
              "      <td>2019</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>Indonesia</td>\n",
              "      <td>3,778,134</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>Brazil</td>\n",
              "      <td>3,596,841</td>\n",
              "      <td>2020</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-a20fc2ac-c471-4bc5-8526-26ab08545922')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-a20fc2ac-c471-4bc5-8526-26ab08545922 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-a20fc2ac-c471-4bc5-8526-26ab08545922');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "  Rank    Country GDP ( PPP , Peak Year ) millions of USD Peak Year\n",
              "0    1      China                              27,804,953      2020\n",
              "1    2      India                              11,321,280      2020\n",
              "2    3     Russia                               4,389,960      2019\n",
              "3    4  Indonesia                               3,778,134      2020\n",
              "4    5     Brazil                               3,596,841      2020"
            ]
          },
          "execution_count": 79,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"what was the total GDP of China and Indonesia in 2020?\"\n",
        "table = query_pinecone(query)\n",
        "table.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 41,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "O8zFOh8F0ut-",
        "outputId": "978ce1f8-6532-4710-d195-131c45ad6e36"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'answer': 'SUM > 27,804,953, 3,778,134',\n",
              " 'coordinates': [(0, 2), (3, 2)],\n",
              " 'cells': ['27,804,953', '3,778,134'],\n",
              " 'aggregator': 'SUM'}"
            ]
          },
          "execution_count": 41,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "get_answer_from_table(table, query)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vB3s21XJEkpr"
      },
      "source": [
        "Here the QA system suggests the correct cells to add in order to get the total GDP of China and Indonesia in 2020."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 93,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "VsEup2NXAjjK",
        "outputId": "ce29bfb6-22d1-4fcf-b469-877e55b0a406"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-85612087-ccea-4ebf-9fdb-f783d2e968d9\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>CO 2 intensity ( kg/kWh )</th>\n",
              "      <th>Power station</th>\n",
              "      <th>Country</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1.58</td>\n",
              "      <td>Hazelwood Power Station , Victoria closed 31 M...</td>\n",
              "      <td>Australia</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1.56</td>\n",
              "      <td>Edwardsport IGCC , Edwardsport , Indiana , clo...</td>\n",
              "      <td>United States</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>1.27</td>\n",
              "      <td>Frimmersdorf power plant , Grevenbroich</td>\n",
              "      <td>Germany</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>1.25</td>\n",
              "      <td>HR Milner Generating Station , Grande Cache , ...</td>\n",
              "      <td>Canada</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>1.18</td>\n",
              "      <td>C. TG . Portes Gil , R\u00edo Bravo</td>\n",
              "      <td>Mexico</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-85612087-ccea-4ebf-9fdb-f783d2e968d9')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-85612087-ccea-4ebf-9fdb-f783d2e968d9 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-85612087-ccea-4ebf-9fdb-f783d2e968d9');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "  CO 2 intensity ( kg/kWh )  \\\n",
              "0                      1.58   \n",
              "1                      1.56   \n",
              "2                      1.27   \n",
              "3                      1.25   \n",
              "4                      1.18   \n",
              "\n",
              "                                       Power station        Country  \n",
              "0  Hazelwood Power Station , Victoria closed 31 M...      Australia  \n",
              "1  Edwardsport IGCC , Edwardsport , Indiana , clo...  United States  \n",
              "2            Frimmersdorf power plant , Grevenbroich        Germany  \n",
              "3  HR Milner Generating Station , Grande Cache , ...         Canada  \n",
              "4                     C. TG . Portes Gil , R\u00edo Bravo         Mexico  "
            ]
          },
          "execution_count": 93,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"what is the average carbon emission of power stations in australia, canada and germany?\"\n",
        "table = query_pinecone(query)\n",
        "table.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 92,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "pYGooFcrGc8Q",
        "outputId": "67cc0dfc-b825-466c-cee7-978152f5ff69"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'answer': 'AVERAGE > 1.58, 1.27, 1.25',\n",
              " 'coordinates': [(0, 0), (2, 0), (3, 0)],\n",
              " 'cells': ['1.58', '1.27', '1.25'],\n",
              " 'aggregator': 'AVERAGE'}"
            ]
          },
          "execution_count": 92,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "get_answer_from_table(table, query)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HG8lPC0ONWxx"
      },
      "source": [
        "As we can see, the QA system correctly identified which cells to average to answer our question."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}